Comparing Ranking-based and Naive Bayes Approaches to Language Detection on Tweets

Authors

  • Pablo Gamallo
  • Marcos García
  • Susana Sotelo
  • José Ramom Pichel Campos
Abstract

This article describes two systems that participated in the TweetLID-2014 competition on language detection in tweets. The systems are based on two different strategies: ranked dictionaries and Naive Bayes classifiers. The results show that ranked dictionaries perform better with small training corpora whose language distribution is similar to that of the test dataset, while a Naive Bayes algorithm improves the scores with large models even if the data are unbalanced with respect to the test dataset. The experiments also showed that models based on word unigrams outperform those based on character n-grams. In the final evaluation, the Naive Bayes classifier ranked first among the unconstrained systems (those trained with external sources) participating in the competition.
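
To make the contrast between the two strategies concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify how the ranked dictionaries are scored, so the rank-based weighting used here (more frequent words contribute more) is an assumption, as are the uniform class prior, the add-one smoothing, the toy training sentences, and all function names. Both classifiers use word unigrams, the feature type the abstract reports as stronger.

```python
import math
from collections import Counter

# Toy training sentences (hypothetical, not taken from the TweetLID data).
TRAIN = {
    "es": ["el perro come en la casa", "la casa es grande y el perro es pequeño"],
    "pt": ["o cão come na casa", "a casa é grande e o cão é pequeno"],
    "en": ["the dog eats in the house", "the house is big and the dog is small"],
}


def tokenize(text):
    """Lowercase and split on whitespace to get word unigrams."""
    return text.lower().split()


def build_counts(train):
    """Count word unigrams per language."""
    return {
        lang: Counter(tok for sent in sents for tok in tokenize(sent))
        for lang, sents in train.items()
    }


def ranked_dictionary_predict(text, counts, top_k=1000):
    """Assumed scoring: each token found in a language's frequency-ranked
    dictionary adds (top_k - rank), so frequent words weigh more."""
    scores = {}
    for lang, counter in counts.items():
        ranks = {w: r for r, (w, _) in enumerate(counter.most_common(top_k))}
        scores[lang] = sum(top_k - ranks[t] for t in tokenize(text) if t in ranks)
    return max(scores, key=scores.get)


def naive_bayes_predict(text, counts):
    """Multinomial Naive Bayes with add-one smoothing and a uniform prior
    (both are simplifying assumptions for this sketch)."""
    vocab_size = len(set().union(*counts.values()))
    scores = {}
    for lang, counter in counts.items():
        total = sum(counter.values())
        scores[lang] = sum(
            math.log((counter[t] + 1) / (total + vocab_size))
            for t in tokenize(text)
        )
    return max(scores, key=scores.get)


COUNTS = build_counts(TRAIN)

for sample in ["the dog is big", "o cão é grande", "el perro es grande"]:
    print(sample, ranked_dictionary_predict(sample, COUNTS),
          naive_bayes_predict(sample, COUNTS))
```

On data this small the two approaches agree; the differences the abstract describes would only show up with realistic corpus sizes and language distributions.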

Similar resources

Comparing Approaches to Subjectivity Classification: A Study on Portuguese Tweets

In this paper, we compare lexicon-based and machine learning-based approaches to define the subjectivity of tweets in Portuguese. We tested SentiLex and WordAffectBR lexicons, and Sequential Machine Optimization and Naive Bayes algorithms for this task. In our study, we used the Computer-BR corpus that contains messages about the technology area. We obtained better results using the Comprehensi...

SAIL: Sentiment Analysis using Semantic Similarity and Contrast Features

This paper describes our submission to SemEval2014 Task 9: Sentiment Analysis in Twitter. Our model is primarily a lexicon based one, augmented by some preprocessing, including detection of MultiWord Expressions, negation propagation and hashtag expansion and by the use of pairwise semantic similarity at the tweet level. Feature extraction is repeated for sub-strings and contrasting sub-string ...

Sentence Boundary Detection for Social Media Text

The paper presents a study on automatic sentence boundary detection in social media texts such as Facebook messages and Twitter micro-blogs (tweets). We explore the limitations of using existing rule-based sentence boundary detection systems on social media text, and as an alternative investigate applying three machine learning algorithms (Conditional Random Fields, Naïve Bayes, and Sequential ...

#WarTeam at SemEval-2017 Task 6: Using Neural Networks for Discovering Humorous Tweets

This paper presents the participation of #WarTeam in Task 6 of SemEval2017 with a system classifying humor by comparing and ranking tweets. The training data consists of annotated tweets from the @midnight TV show. #WarTeam’s system uses a neural network (TensorFlow) having inputs from a Naïve Bayes humor classifier and a sentiment analyzer.

Comparing Experiential Approaches: Structured Language Learning Experiences versus Conversation Partners for Changing Pre-Service Teacher Beliefs

Research has shown that language teachers’ beliefs are often difficult to change through education. Experiential learning may help, but more research is needed to understand how experiential approaches shape perceptions. This study compares two approaches, conversation partners (CONV) and structured language learning experiences (SLLE), integrated into a course in language acquisition. Partici...

Publication date: 2014